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ABSTRACT 

We  consider  the  nonlinear  differential  equation  rr  =  L(u)  +  f(tj. 

n 
Use  of  Galerkin  FEM  with  u(x,  t)  ~  v(x,  t)  =  E  Y.(t)N.(x),  wnere  the 

j  =  I  J    J  - 

N.(x)  are  specified  basis  functions,  results  in  the  implicit  system  of 

J  "  n 

ordinary  differential  equations,  (*)  S  A.,  y.   -  B- (y,,...,  y     f)  =  0, 

j  =  I  J  J 
i  =1,  ...,  n,  where  A  .  =  <N. ,  N.>,  B.  =  <N.,  L(v)  +  f>. 

The  method  chosen  for  solution  of  stiff  systems  (*)  is  a  version 
of  Gear's  method  which  solves  the  system  in  its  implicit  form.  This 
leads  to  the  necessity  of  being  able  to  solve  (repeatedly)  linear 
algebraic  equations  whose  coefficient  matrix  has  the  same  sparse  and 
banded  nature  as  (A..). 

Storage  requirements  for  various  orders  of  polynomial  triangular 
elements  under  compact  storage  mode,  profile  storage  mode,  and  banded 
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symmetric  storage  mode  are  given  and  compared.   For  large  systems  (*), 
compact  storage  mode  leads  to  significantly  reduced  requirements. 
Consideration  of  the  linear  algebraic  systems  which  arise  in  Gear's 
method  reveals  that  iteration  should  be  computationally  efficient.  A 
comparison  between  various  solution  methods  is  given  for  a  nonlinear 
reactor  dynamics  problem.  Associated  with  each  solution  method  is  a 
different  storage  mode. 

1 .  Discription  of  the  Problem 

We  consider  a  nonlinear  p.d.e.  of  the  parabolic  type, 

|~  =  L(u)  +  f   X  £  D,  t  £  I  (1) 

with  appropriate  initial  and  boundary  conditions,  and  L  denotes  spatial 
operators.   In  accordance  with  a  weighted  residual  FEM  formulation,  an 
approximate  solution  v(x,t)  in  the  form 

n 

"U(X,  t)  S'vfx^t)  =  Z    Yj(t)N.j(x)  U) 

j  =  1 

is  assumed.   In  Eq.  (2),  N.(x)  are  a  set  of  specified  interpolation 
functions  with  local  support,  and  the  y.(t)  are  the  solution  coefficients 
to  be  determined.  Setting  the  residual  function 

R(x,  t)  =f£--  L(v)  -  f  (3) 

orthogonal  to  each  of  the  weighting  functions  W.(x),  i  =  1,  ...,  n, 
i  .e. 

<R,  W.>  =  u,  i  =  1 ,  . . . ,  n  (4) 
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yields  the  system  of  nonlinear  o.d.e., 


1     Aij  YJ  "  Bi(YV 'V  f)  =  °  (5) 


J  =  1 
with  initial  conditions,  where 


A.  .  =  <W.,  N.> 

Bi   =  <L(v)  +  f,  W.> 


(6) 


Out  objective  is  to  select  a  method  of  solution  of  Eq.  (5)  which  is 
efficient  with  respect  to  memory  core  requirement,  and  computational 
effort.  With  regard  to  core  requirement,  an  efficient  strategy  should 
take  into  account  the  nature  of  the  (A..)  matrix.   If  the  weighting 
functions  have  local  support,  A  will  be  sparse  and  banded.   If  a  Galerkin 
formulation  is  employed,  W.  =  N.,  and  the  (A-  •) matrix  is  symmetric.   In 
general  the  sparseness  of  (A..)  increases  with  finer  mesh  discretization, 
as  well  as  space  dimensionality.  Bandwidth  and  sparseness  increase  with 
higher  order  polynomial  interpolation  elements,  but  fewer  are  required  to 
provide  an  accuracy  achieved  by  lower  order  elements.  The  question  of 
which  is  more  efficient,  higher  or  lower  order  elements,  is  not  addressed 
here. 

Attention  is  given  here  to  a  stiff  system  arising  from  a  FEM  formu- 
lation of  a  two  dimensional  nonlinear  nuclear  reactor  dynamics  problem 
[1  ] .  Here  L(u)  is  given  by 

L(u)  =  -au2  +  bu  +  cA2u  (7) 

with  appropriate  initial  and  boundary  conditions.   In  this  case,  Eq. 
(5)  becomes,  after  an  integration  by  parts  on  the  A  term, 

EAfj;.  ♦  a£ICjjkyjYk  -  MAfjYj  ♦  cIBijYj  =  0  (8) 
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where 


A.  .  =  <N.,  N.> 


C.  ..  =  <N.,  N,m  > 


-  3N.   3N. 

B   =  <—  — 1 
cij    3X  '  3x 


3N.   3N. 
>  +  <— L.  _J-> 


ay  '  3y 


(9) 


2.  Solution  Techniques  for  the  Implicit  System  of  O.D.E.'s 

Consideration  of  various  schemes  for  the  solution  of  the  implicit 
system  of  ordinary  differential  equations  given  by  Eq  (5)  reveals  that 
no  matter  what  type  of  scheme  is  employed,  it  will  involve  the  solu- 
tion of  a  system  of  algebraic  equations,  possibly  lonlinear  if  (3.)  is 
nonlinear.  Use  of  even  a  simple  scheme  for  explicit  systems  of  dif- 
ferential equations,  e.g.,  Euler's  method,  requires  repeated  solution 
of  a  system  of  linear  equations  with  coefficient  matrix  (A..)  for  the 
Y-,  given  y,  »  .  ••»  Yn>  and  t.   If  a  predictor-corrector  method  (or 
any  method  involving  derivatives  at  the  new  time,  generally  called  an 
implicit  numerical  method)  for  explicit  systems  of  differential  equa- 
tions is  used,  a  second  system  of  algebraic  equations  arises  for "the 
dependent  variables  at  the  new  time.  Because  of  this  second  system  of 
algebraic  equations  it  is  best  to  avoid  having  to  solve  (5)  for 
derivatives  by  employing  an  ordinary  diffential  equation  solver 
designed  for  implicit  systems  of  equations. 

Given  that  an  implicit  method  will  be  employed  to  solve  the  system 
(5)  there  are  three  levels  of  matrix  storage  that  are  required:   (1) 
That  required  by  the  system  matrices  (A..)  and  (B.)   (We  are  not 
being  specific  here  about  the  form  of  (B-);  it  may  involve  several  con- 
stant matrices  or  may  be  a  function  of  time);  (2)  That  required  by  the 
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,  Y  at  the  next  time  in  the 


differential  equation  solver;  and  (3)  That  required  to  represent  the 
algebraic  system  of  equations  for  y-j  , 
form  required  by  the  algebraic  equation  solver  being  used.  The 
hierarchy  of  storage  levels  is  shown  schematically  in  Figure  1,  along 
with  possible  options  for  differential  equation  solvers  and  algebraic 
equation  solvers,  with  the  preferred  storage  mode  shown  in  parenthesis 
Level  1:  System  Matrices:   (A--),  (Compact) 


Level  2 


ODE  Solver: 


(3.)t   (Compact  or  ?) 

e.g.  Implicit  Gear   (Full, 

Crank-Nicolson    nx?) 

others 


Level  3 


Algebraic  Equation  Solver 


Linear  Case 


Nonl inear  Case 


Direct  Methods    Iterative  Methods   Newton's  Method, 

i 


(Compact) 
(Banded)  (Profile)  (Compact) 


ior  others  leading 

, to  1 inear  systems 

I    (See  Linear) 
L ! 


Iterative 
not  leading 
to  1 inear  systems 
(Compact) 


Figure  1 


It  is  difficult  to  show  all  the  possible  options  and  Figure  1  is  not 
meant  to  exclude  any,  but  rather  to  emphasize  several  points.   (1)  The 
system  matrices  are  used  only  in  evaluation  of  the  left  side  of  (5)  and 
can  be  stored  in  any  form,  some  form  of  compact  storage  being  efficient. 
(2)  The  differential  equation  solver  will  have  its  own  requirements  for 
storing  the  solution  values,  past  history,  and  auxiliary  storage. 
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(3)  The  choice  of  solution  method  for  the  algebraic  equations  will 
determine  the  type  of  storage  required  at  level  3.   In  some  instances 
the  latter  choice  may  be  determined  by  the  differential  equation  solver, 
and  could  require  no  additional  storage  in  some  cases,  but  a  more  usual 
situation  will  be  where  at  least  one  matrix  must  be  stored. 

Because  the  problems  in  which  we  are  interested  are  typically  stiff 
we  were  led  to  Gear's  method,  which  performs  well.  This  method  was 
used  in  a  form  designed  for  implicit  systems  of  differential  equations 
[2],  and  is  based  on  [3].  Gear's  method  is  a  variable  order,  variable 
stepsize,  predictor-corrector  scheme.  The  derivatives  at  the  new  time 
are  approximated  by  a  backwards  difference  formula,  and  the  resulting 
corrector  equation  is  solved  by  a  quasi-Newton's  method.  This  leads 
to  repeated  solution  of  equations  of  the  form  J6y  =  p,  where  the 

solution  6y  represents  incremental  corrections  to  the  solution  values. 

S      3B  • 
For  Eq.  (5),  J  =  (-uA,-,-  -  T"1 ) »  where  h  is  the  current  setpsize  and  s 

w 

is  a  constant  dependent  on  the  current  order  formula  being  applied. 
Our  version   is   designed  to  facilitate  easy  incorporation  of  what- 
ever solution  scheme  and  associated  storage  scheme  is  suitable  for 
these  linear  systems.  Since  the  user  must  supply  a  subprogram  to 
evaluate  the  matrix  J,  it  is  then  relatively  simple  for  the  user 
to  store  the  matrix  in  a  form  compatible  with  the  equation  solver 
being  used. 

In  our  scheme,  the  amount  of  storage  required  at  level  2  is 
approximately  20  n  words.  Level  1  storage  is  dependent  on  the 
problem,  and  level  3  storage  on  the  linear  equation  solver  incor- 
porated into  the  method.  The  details  for  a  specific  problem  are 
discussed  in  Section  4. 
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3.  Storage  Schemes 

The  most  common  method  of  storing  matrices  in  FEM,  is  the  banded 
storage  scheme,  whereby  the  bandwidth  (or  half  bandwidth  in  the  case 
of  symmetric  matrices)  terms  are  stored.  Some  reduction  in  storage  is 
obtained  by  profile  (or  skyline)  storage.   In  this  scheme,  some  of  the 
zero  terms  within  the  band  are  eliminated.  Band  and  profile  storage 
are  schematically  shown  in  Figure  2. 


Symmetric  Banded 


Grid  (R=2,  S=3,  t=2) 


Profile 


Figure  2 
For  large  systems,  the  storage  allocated  to  zero  terms  by  either  the 

band  or  profile  scheme  comprises  a  large  fraction  of  the  total  storage. 
Thus,  a  compact  storage  scheme,  which  stores  only  the  non-zero  coeffi- 
cients of  a  matrix  provides  a  substantial  reduction  in  core  require- 
ments for  large  systems. 

The  implementation  of  compact  storage  requires  two  integer  array 
vectors,  say  ISTART  and  NAME,  and  a  vector  of  the  non  zero  coefficients, 

t  h 

say  AA.  The  i   integer  entry  in  ISTART  is  the  number  q.,  where 


i  -  1 

E 
J  =  1 


V1 


(10) 


,-th 


and  P.  is  the  number  of  terms  in  the  jL"  equation  (i.e.  the  number  of 
nodes  connected  to  the  j   node).   In  n  is  the  number  of  unknowns  in 
the  system,  the  length  of  ISTART  is  (n  +  1).  ISTART  then,  is  a  pointer 
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vector  whose  j   term  locates  the  initial  position  in  the  AA  vector  of 
the  contributing  coefficients  to  the  j   equation.  The  M  x  1  NAME 
vector,  where 

is  composed  of  n  successive  vector  blocks  of  variable  length  P..  The 
P.  integers  in  the  j   block  of  NAME  identify  the  contributors  to  the 

i.  u 

j   equation.  The  M  x  1  vector  AA,  contains  the  real  non-zero  coeffi- 
cients of  the  n  x  n  A  matrix,  arranged  in  the  same  contiguous  block 
arrangement  as  the  NAME  vector. 

A  comparison  of  the  core  requirements  of  a  symmetric  banded  matrix 
using  banded,  profile  and  compact  storage  follows.  To  fix  ideas,  we 
consider  a  simple  rectangular  domain  with  R  rows  of  elements,  and  S 
columns  of  elements.  The  class  of  triangular  elements  with  polynomial 
interpolation  are  considered.  The  formulas  presented  are  for  the  case 
where  interior  nodes  are  condensed  out.  The  following  notation  is  used. 

n   -  the  number  of  unknowns 

n   -  the  half  bandwidth  for  a  symmetric  matrix 

t   -  the  order  of  polynomial  interpolation 

R   -  the  number  of  rows  of  elements  in  the  rectangular  grid 

S   -  the  number  of  columns  of  elements  in  the  rectangular  grid 

N   -  symmetric  band  storage 

N       -  profile  storage 

N   -  compact  storage 

a   -  bytes  per  word  for  real  numbers 

3   -  bytes  per  word  for  integer  numbers 
To  obtain  minimum  bandwidth,  numbering  of  nodes  is  sequential  in  the 
vertical  direction  if  R  <  S,  and  vice  versa  if  S  >  R,  for  profile  and 
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banded  storage.     The  numbering  sequence  for  compact  storage   is   irrele- 
vant.    For  an  R  x  S  rectangular  grid   the  number  of  unknowns   is 

n  =  RS(3t   -   2)   +   (R   +  S)(2   -   2t)   +   (t  -   1)  (12) 

For  very  large  systems,    i.e.    RS  >>   (R  +  S), 

n  :  RS(3t  -  2)  (13) 

The  core  requirement  for  each  of  the  storage  schemes,   for  R  <  S,   is 
a)   banded  storage 


• 


where, 


N     =  anns  (14) 


n$  =  3Rt  -  2R  -  t  +  3  (15) 


For  very  large  systems 


Ns  :  ctR2S  for  t  =  1  (16) 

N     :  16aR2S  for  t  =  2 

s 

N     :  49aR2S  for  t  =  3 

s 


b)     profile  storage 

Np  =  N$   -  aQ  (17) 

where 


Q 


m   (R  -   2)(R  -   1)   t2  +   (R  .   2)(R  -   l)(t  -   l)t   (S   -   1) 


2 


(18) 

(R  -  2)(t-l)t(S  -  1)   +  [2R(t  -   1)   +  H[2R(t   -   1)   +  2](:  _   2) 


+  2  2 


For  very  large  systems 


Np  :  ctR2S 

t   =  1 

Np  :  12aR2S 

t  =  2 

Np  :   35aR2S 

t   =  3 

(19) 
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c)  compact  storage 


N  =  aM  +  S(M  +  n  +  1 ) 


where 


M  =  RS(15t2  -  6t  -  2)  +  (R  +  S)(-14t2  +  8t  +  2) 


+  (13t  -  lOt  -  1) 


For  very   large  systems 


(20) 


(21) 


N  :  RS(7a  +  88)  -  t  =  1 

Nc  :  RS(46a  +  508)  t  =  2 

Nr  :  RS(115a  +  1223)  t  =  3 

Km 


(22) 


d)  Comparison  of  N  ,  N  and  N  for  large  systems. 

It  is  noted  that  banded  and  profile  storage  are  proportional 

2 
to  R  S,  while  compact  storage  is  proportional  to  RS.  The  following 

formulas  compare  the  relative  core  requirements  for  banded,  profile 

and  compact  storage  schemes. 

i)  Savings  of  profile  compared  to  banded  storage 

0.0      t  =  1 


N  -  N   ~ 
_§ £  - 


t  =  2 
t  =  3 


(23) 


ii)     Savings  of  compact  compared  to  profile 


f    ,       7      88 

R  "  aR  t  =  1 


i        23       25e 

fiR  fi«R  t     = 


V 


1       - 


6R       6aR 

23       2228 
7R  "  35aR~ 


t   =  3 


(24) 
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To  fix  ideas,  say  3  =  £a,  then  for  large  systems 


N„ 


1  -  i± 


11 

R 


<     1  -  — 


71 
12R 


1  - 


352 
70R 


t  =  1 
t  =  2 

t  =  3 


(25) 


It  should  be  noted  from  Eq.  (24),  that  banded  and  profile  storage  is 

less  than  compact  storage  for  small  systems.  For  example,  in  the  case 

7   88 
of  t  =  1 ,  banded  and  profile  storage  is  more  efficient  when  1  <  (=■  +  ~), 

i.e.,  when  R  <  11  in  the  case  t  =  1  and  3  =  a/2. 
4.  Numerical  Results  for  an  Example  Problem 

We  consider  the  example  given  by  Eq.  (7),  resulting  in  the  ordinary 
differential  equations  (3).  The  domain  was  a  rectangle  which  was 
discretized  with  11  rows  and  12  columns,  giving  132  nodes  and  220  ele- 
ments. There  were  22  boundary  nodes  (fixed  values).  Using  linear 
triangular  elements,  a  system  of  110  differential  equations  was  obtained. 
The  three  dimensional  array  (C.  ..  )  required  special  consideration  for 

I  J  K 

its  storage.     We  noted  that  n  n       n 

j  =  l     k=l     ^k   J    k        j  =  l     k=j    1Jk    J    K 

where 

j  =  k 


Dijk 


Cijk 


c. ..  +  c,  . 

ljk         lkj 


(26) 


j  <  k 


Because  of  the  regular  rectangular  grid  employed  here,   each 
equation  contains   no  more  than   7  terns.     To  facilitate  handling  of  the 
nonlinear  term,    seven  entries  were  allotted  to  each  block  of  the  NAME 
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array  (i.e.  P.  =  7  for  all  j).   For  equations  with  q  contributing  terms, 
where  q  <  7,  there  were  (7  -  q)  null  entries  in  the  NAME  array.   For  the 
nonlinear  term  given  by  Eq.  (26),  the  number  of  non  zero  coefficients  in 
any  equation  is  no  more  than  25,  the  number  of  combinations  of  seven 
nodes  taken  two  at  a  time  (i.e.  C...  +  C.  .,  j  <  k)  plus  the  seven 
diagonal  terms  C--.  Thus,  the  nonlinear  term  requires  28n  words. 
Each  of  the  (A..)  and  (B-.)  matrices  requires  7n  words.  Total  level  1 
storage  required  is  42na  bytes  plus  7n3  bytes  for  the  NAME  array;  the 
ISTART  array  is  not  required  for  this  modified  compact  storage  scheme. 


The  J  matrix  that  arises  in  this  problem  is  (§A.  .  -  bA.  .  +  CB..  + 

n  ij     ij     ij 


n 


2al  C...Y  ).   The  matrix  can  be  stored  in  compact  form  using  the  same 
k=l  1JK  k 

NAME  array  as  for  (A.  .). 

Three  different  linear  equation  solvers  were  considered,  along  with 
their  associated  storage  schemes  for  J.  The  first  was  the  IMSL  pair 
LUDAPB/LUELPB  for  matrices  in  symmetric  banded  storage  form.  The  half 
bandwidth  for  our  sample  problem  was  12,  thus  in  this  case  12n  =  1320 
words  were  required.  No  additional  working  storage  is  required  by 
LUDAPB  since  it  performs  an  in-place  decomposition  of  J.   In  the 
general  case,  storage  requirements  for  the  J  matrix  are  those  for  a 
symmetric  banded  matrix  as  given  by  Eqs.  (14)  and  (15). 

The  second  equation  solver  used  was  an  iterative  method,  SOR,  for 
which  compact  storage  was  used.  This  required  7n  =  770  words.   In 
general.,  storage  requirements  for  the  J  matrix  in  compact  form  are 
given  by  Eq's  (20  and  (21).   SOR,  of  course,  does  not  require  any 
additional  working  storage. 

The  third  equation  solver  considered  was  the  symmetric  form  of 
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the  Yale  Sparse  Matrix  Package  [4]-  This  required  J  to  be  stored  in  a 
symmetric  form  of  the  compact  storage  described  in  Section  3,  and  for 
our  case  required  399  words  to  store  J,  plus  510  words  to  store  the 
NAME  and  ISTART  arrays.  In  addition,  approximately  1500  words  were 
required  to  store  the  decomposition  of  J,  along  with  the  NAME  and 
ISTART  arrays.  In  general,  storage  for  the  Yale  Sparse  Matrix  Package 
should  be  much  less  than  that  for  the  profile  scheme  given  by 
Eqs  (17)  and  (18),  since   a  reordering  of  rows/columns  to  minimize 
fill-in  during  the  matrix  factorization  is  done. 

The  particular  problem  we  have  used  as  an  example  was  designed  to 
illustrate  the  feasibility  of  using  the  three  different  storage/solution 
schemes,  and  the  computational  times  and  storage  here  are  not  likely 
to  be  representative  of  what  might  happen  in  larger  problems.  In 
particular,  the  relatively  small  bandwidth  favors  the  symmetric  band 
storage  mode  in  computational  effort. 

The  SOR  method  must  converge  yery   rapidly  to  be  competitive  in 
computational  effort,  since  about  7n  operations  are  required  per 
iteration,  whereas  about  2n  n  (after  factorization  of  J)  are  required 
for  solution  with  symmetric  banded  matrices  of  half  bandwidth  n  . 
Somewhat  fewer  operations  are  required  for  the  Yale  Sparse  Matrix 
Package.  For  our  case,  SOR  requires  more  computational  effort  than 
direct  methods  when  the  number  of  iterations  for  convergence  exceeds 
4  (i.e.  when  7nN,  >  2n  n,  where  n  is  12  and  N,  is  the  number  of 
iterations),  although  this  is  offset  by  the  need  to  factor  J  each  time 
it  is  recomputed.  Since  the  solution  oy  of  the  system  J3y  =  p 
method  increments  for  the  corrector  equation,  the  accuracy  requirements 
are  low,  and  SOR  requires  few  iterations  for  convergence.  The  results 
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34.8 

26.2 

47.1 

50.1 

71.0 

57.0 

of  our  example  should  be  observed  with  the  above  considerations  in  mind 
All  times  were  obtained  on  the  IBM  360  model  67,  using  the  Fortran  H 
compiler. 


rms  accuracy 

required  in  UUDAPB/UJ  ELP3  YALE  SOR 

Gears  Method 

.1  30.0 

.01  48.7 

.001  66.6 


Table  1 
For  systems  with  large  bandwidths,  we  expect  the  computational 
effort  required  for  both  the  Yale  Sparse  Matrix  Package  and  SOR  to  be 
superior  to  the  symmetric  banded  scheme. 
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